Characterization of Gradient Dominance and Regularity Conditions for Neural Networks
نویسندگان
چکیده
The past decade has witnessed a successful application of deep learning to solving many challenging problems in machine learning and artificial intelligence. However, the loss functions of deep neural networks (especially nonlinear networks) are still far from being well understood from a theoretical aspect. In this paper, we enrich the current understanding of the landscape of the square loss functions for three types of neural networks. Specifically, when the parameter matrices are square, we provide explicit characterization of the global minimizers for linear networks, linear residual networks, and nonlinear networks with one hidden layer. Then, we establish two quadratic types of landscape properties for the square loss of these neural networks, i.e., the gradient dominance condition within the neighborhood of their full rank global minimizers, and the regularity condition along certain directions and within the neighborhood of their global minimizers. These two landscape properties are desirable for the optimization around the global minimizers of the loss function for these neural networks.
منابع مشابه
Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns
The purpose of this study is to analyze the performance of Back propagation algorithm with changing training patterns and the second momentum term in feed forward neural networks. This analysis is conducted on 250 different words of three small letters from the English alphabet. These words are presented to two vertical segmentation programs which are designed in MATLAB and based on portions (1...
متن کاملOn Regularity of Acts
In this article we give a characterization of monoids for which torsion freeness, ((principal) weak, strong) flatness, equalizer flatness or Condition (E) of finitely generated and (mono) cyclic acts and Condition (P) of finitely generated and cyclic acts implies regularity. A characterization of monoids for which all (finitely generated, (mono) cyclic acts are regular will be given too. We als...
متن کاملA conjugate gradient based method for Decision Neural Network training
Decision Neural Network is a new approach for solving multi-objective decision-making problems based on artificial neural networks. Using inaccurate evaluation data, network training has improved and the number of educational data sets has decreased. The available training method is based on the gradient decent method (BP). One of its limitations is related to its convergence speed. Therefore,...
متن کاملConjugate gradient neural network in prediction of clay behavior and parameters sensitivities
The use of artificial neural networks has increased in many areas of engineering. In particular, this method has been applied to many geotechnical engineering problems and demonstrated some degree of success. A review of the literature reveals that it has been used successfully in modeling soil behavior, site characterization, earth retaining structures, settlement of structures, slope stabilit...
متن کاملTraffic Signal Prediction Using Elman Neural Network and Particle Swarm Optimization
Prediction of traffic is very crucial for its management. Because of human involvement in the generation of this phenomenon, traffic signal is normally accompanied by noise and high levels of non-stationarity. Therefore, traffic signal prediction as one of the important subjects of study has attracted researchers’ interests. In this study, a combinatorial approach is proposed for traffic signal...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1710.06910 شماره
صفحات -
تاریخ انتشار 2017